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Abstract —  We  introduce  a  novel  generalization  of  the  compound 
Gaussian  (CG)  (or  Gaussian  Scale  Mixture  [1])  distribution 
which  extends  the  Gaussian  component  of  the  CG  model  to  a 
multilinear  distribution.  The  resulting  model,  which  we  call  the 
Multilinear  Compound  Gaussian  (MCG)  distribution,  subsumes 
both  GSM  [1]  and  the  previously  developed  MICA  [3-4] 
distributions  as  complementary  special  cases;  thereby  allowing 
us  to  model  a  richer  class  of  stochastic  phenomena.  First  we 
derive  the  structural  characterization  of  the  MCG  distribution 
and  develop  some  of  its  important  theoretical  properties. 
Thereafter  we  describe  a  parameter  estimation  algorithm  for 
learning  this  model  from  sample  data,  and  then  deploy  this  for 
modeling  textures,  including  natural  (i.e.  optical)  and  SAR 
images.  Our  simulation  results  demonstrate  how,  for  each  case, 
we  obtain  improved  performance  over  the  CG  model;  thus 
indicating  the  versatility  of  the  MCG  model  in  accurately 
modeling  various  natural  phenomena  of  interest. 
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I.  Introduction 

The  compound  Gaussian  (CG)  model — also  known  as  the 
Gaussian  Scale  Mixture  or  GSM  [1]  in  the  image  processing 
community — is  a  well  known  statistical  distribution  that  has 
been  employed  as  a  powerful  low-level  prior  model  for 
important  applications  such  as  the  denoising  [2]  of  natural 
images.  In  the  radar  community  too,  CG  is  known  to  be  an 
accurate  statistical  model  for  the  wavelet  distribution  of  SAR 
images  [5]  with  applications  such  as  image  reconstruction  [6], 
and  also  forms  the  basis  of  the  K-distribution  [7-8]  that  is 
frequently  used  for  modeling  RCS  returns  from  sea  clutter 
data.  Indeed  CG  is  a  very  versatile  statistical  model  that 
specializes  to  many  well  known  distributions  such  as  the  01- 
stable  and  symmetrized  Gamma  distributions  [1]  and  which 
serve  as  useful  prototypes  of  heavy-tailed  processes — thus 
providing  the  probabilistic  underpinnings  for  the  generation  of 
various  sparse-structured  stochastic  phenomena  such  as 
described  above. 

Consider  a  random  variable  X  that  can  be  decomposed  into 
the  following  product  form  (pointwise  product  of  2  vectors): 

X  =  S  ■  B  (1) 

The  probability  density  function  ofXis  thus  given  by: 

Px CO  =  f  PxWB  =  p)PB  (/?)  ■  dp  (2) 


=  f±ps(x/pypB(p)-dp  (3) 

where  Z  is  a  normalizing  constant  for  Ps(x//?). 

In  the  special  case  where  S~N(g,'Z')  is  a  Gaussian 
random  vector  with  mean  /i  and  covariance  matrix  X,  and  B  is 
a  non-Gaussian  random  vector  consisting  of  i.i.d.  components, 
Ain  (1)  is  said  to  CG  distributed  random  variable: 

px 00  =  f  exp  (-((x  -  - 

(0//0)  PB(P) 1  dp 
(4) 

The  problem  that  we  are  concerned  with  in  this  paper  is 
the  structure  and  properties  of  (1)  when  the  conditional  density 
Px(x\B  =  /?)  in  (2)  is  replaced  by  a  more  general  multilinear 
distribution  of  the  following  form: 

Px(x\B  =  P)  =  ^gVOUiPiOcd  (5) 

where,  X  =  \xt,  ...,xd\.  The  probability  densities 
generalize  Gaussian  random  variables  in  a  manner  specified 
below,  while  the  multilinear  term  g(X)  captures  the  higher- 
order  statistical  interactions  between  the  components  ofX. 

In  Section  2,  we  introduce  a  non-linear  system  model  that 
serves  as  the  basis  of  our  theoretical  development  of  the  MCG 
model  and  its  properties.  We  will  see  how  the  MCG  model 
developed  subsumes  both  CG  and  the  previously  developed 
multilinear  ICA  (MICA)  distribution  [3-4]  as  complementary 
special  cases.  Section  3  describes  an  algorithm  for  estimating 
the  parameters  of  the  MCG  model.  We  conclude  by  providing 
simulation  results  demonstrating  the  performance  of  the  MCG 
distribution  in  Section  4  for  Texture,  Optical  and  SAR  images. 

II.  Multilinear  Compound  Gaussian  Model 

We  first  describe  the  construction  of  the  MCG  distribution  in 
terms  of  a  novel  non-linear  system  model.  Thereafter  in 
section  II-B  we  present  some  theoretical  properties  of  the 
MCG  model. 

A.  Structure  of  the  MCG  Distribution 

Let  x  =  [xr,  x2, ... ,  xd]T  £  be  an  observed  random 
vector  distributed  according  to  the  MCG  model  (5).  We  now 
demonstrate  how  (5)  can  be  synthesized  via  the  generative 
model  shown  in  Figure  1.  The  system  F  in  Figure  1  consists  of 
a  core  non-linearity  q>  preceded  by  a  linear  system  y  =  As  + 
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T,  where  y  =  [y1(y2,  ,ydY  e  and 

s  =  [Si.Sj, ...  ,  sd]r  6  M.d  is  a  Gaussian  random  vector 
consisting  of  i.i.d.  components  s;~iV(0,l)  whose  density  is 
denoted  by  q(s;).  Vectors  T  =  [r1(  T2, ... ,  Td]T  and  a  = 
[o^,  < J2,  ... ,  ad]T  respectively  determine  the  mean  and  variances 
for  the  various  Gaussian  channels.  Matrix 
C  =  [C1(  C2, ... ,  Cd]T  =  A-1  is  assumed  to  be  invertible  and 
thus  determines  the  interaction  between  the  various  Gaussian 
sources.  Vector  /?  =  [ft,/?2<  ■■■,Pd.Y  e  is  an  instantiation 
of  random  variable  B  of  the  MCG  model  corresponding  to  (1). 
The  constants  y1  and  y2  play  an  important  role  in  determining 
the  properties  of  the  non-linear  transformation  cp,  as  we  show 
in  the  next  section.  All  the  operations  in  system  F,  other  than 
the  action  of  matrix  A,  are  pointwise  in  the  components  of  the 
vector. 

Given  this,  an  important  observation  is  that  for  a  given 
B  =  /?,  pointwise  invertibity  of  the  non-linearity  (p  results  in  a 
multilinear  distribution  of  the  following  form  (a  full  derivation 
is  given  in  the  extended  version  of  this  paper): 

(6> 

where  pk  (ff)  =  Kkexp  (  -ft,,  iff)  -  <4]  ) 

Kk  is  a  normalizing  constant,  ak  =  (1/2)  £f=1(C2k/ <Jk ) 


Lemma  1:  Let  yjM^s)  =  Xx  and  y2u2(s)  =  X2  such  that 
X1,X2  are  real  constants.  Then  cp  in  Equation  (6)  is  invertible 
if  and  only  if  XxX2  >  0. 

Proof: 

We  consider  first  the  converse  case  i.e.  assume  X1X2  >  0: 
Case  1:  21  >0  and  X2  >  0.  Then  equation  (6)  becomes: 

1*1  =  [yiMIsKCcrs)  +  y2<T2|s|2ti2(ers)]/?  (since 
sgn{x)  =  sgn(os)) 

=>  \s\2a2(lX2  +  o(lX1\s\  —  \x\  =  0 

|  |  +  V(o-/?A1)2+4|x|q-2/?A2 

1  1  2  a2/?A2(s) 

Thus  the  only  feasible  solution  is: 

|  |  -aPAi+  V(<T/?Z1)2+4|a:|g2/;A2 
11  2 a2pX2 

Case  2:  Xx  <  0  and  X2  <  0 

In  this  case,  \x\sgn(x')  =  —  5ign(s)[y1|cr||s|u1((js)  + 
y2<J2|s|2U2((Js)]/? 

We  once  again  solve: 

\x\  =  [Ki  |rx||s|t/1(crs)  +  y2ff2|s|2ii2(ffs)]/? 
to  get  the  unique  solution  above  together  with  the 
relationship:  sgn(x)  =  —sgn(as'). 

Thus  in  both  cases  (p  is  invertible  thus  proving  sufficiency. 


and  ck  =  Tk  +  ■ 


Li,jci,kr 


i,k 

^k=l  _2 


~,<P  =  <P 


Now  assume  that  cp  is  invertible.  We  then  have  4  cases: 
Case  I :  Xx  >0  and  X2  >  0.  See  above. 

Case  2:  Xx  <  0  and  X2  <  0.  See  above. 


Also,  g(J )  =  exp 


where  Gy  =  Ek=1 


d  Cfc,iC  —  and  K 


■m -&M 


(27r)a/2  rifc=i  °kKk 


We  have  found  the  following  linear-quadratic  non-linearity 
to  effectively  capture  the  statistics  of  real  data: 

<p(s)  =  [y^au^as)  +  y2(pq(sa)u2(as)]p  (7) 

where,  functions  ut  and  u2  are  defined  in  the  next  section, 
(pq(s)  =  s2sgn(s'),  /?  >  0,  a  >  0,  and  sel. 


This  non-linearity  furnishes  a  natural  generalization  of  the 
Gaussian  distribution — in  particular  it  consists  of  both  a  linear 
and  quadratic  channel  and  reduces  to  a  multidimensional 
Gaussian  distribution  when  only  the  linear  channel  is  active. 


B.  Theoretical  Properties  of  MCG 

As  described  in  the  previous  section,  an  important 
consideration  is  the  characterization  of  the  conditions  under 
which  equation  (7)  is  invertible.  This  is  because  invertibility 
makes  it  possible  to  generate  a  multilinear  distributions 
whereas  in  general  one  will  obtain  a  mixture  distributions 
which,  though  more  general,  are  computationally  less 
tractable.  The  following  Lemma  gives  a  characterization  the 
invertibility  of  cp  in  (7)  for  the  case  where  d=  1  (which 
therefore  applies  to  the  invertibility  of  the  general  case  since  (p 
is  a  component- wise  transform). 


Case  3:  Xx  >  0  and  X2  <  0 

In  this  case  we  are  forced  to  solve  the  original  quadratic 
problem  wherein: 

X2flo2s2sgn(s)  +  X1fos  —  x  =  0 
To  solve  this,  we  consider  2  sub-cases: 

Case  3.1 :  s  >  0.  Then: 

X2fo2s2  +  Xxo  s  —  x  =  0 


_  —  cr/?Ai  +  /  (cr/?Z  i)2+4X(Tz/?A2 
^  5  ~  2 a2px2 

Since  s  >  0  by  assumption,  the  feasible  solutions  are: 

-apii- -7((t/?A1)2+4X(T2/?Z2 
2  rr2pA2 

(provided:4x<r2 fiX2  >  —(pfiXff2) 

 )2+4xrj2pA2 


S  = 


(8) 


and, 


2 a2pA2 

(provided:  4xa2pX2  <  0)  (9) 

Both  (7-8)  can  be  satisfied  if: 

-{oflXf)2  <  4xa2pX2  <  0  (10) 

Since  X2  <  0  by  assumption,  (10)  is  satisfied  for  any  x  <  0. 
Thus  in  general  there  can  be  2  solutions  to  equations  (6)  and 
so  (p  is  not  invertible  for  case  3.1. 


Case  3.2:  s  <  0.  Here  we  solve: 

—X2f]o2s2  +  X^os  —  x  =  0 

_  -apXp  ±  ^/((T/?Z1)2-4xcr2/?Z2  _  -(TpX1  ±  / 2  +4X(T2/?[A2| 

-2  (T2PX2  ~  2u2/?|A2| 

Here  the  only  feasible  solution  is: 
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_  -u/?Ai- V(ff/?Ai)2 3 4+4a:g-2/?|A2T  n 

2°2P\A2\  *' 

provided  that  Axo2p \ A2  |  >  —  (o'/?! i)2  (12) 

In  general  (11)  can  be  violated  by  choosing  x  accordingly 
and  thus  (p  is  not  invertible. 

Case  4:  !1  <0  and  !2  >  0.  By  a  similar  reasoning  as  applied 
to  Case  3.1,  (p  is  not  invertible  in  general  for  this  case  also. 


neighborhood  sampled  from  the  image.  To  determine 
optimum  /?  parameter  for  each  neighborhood  we  solve  the 
following  equation  w.r.t.  /? : 


dPx(x\B~P)  _  vd  ak\xk\  c  (xk\  i  v* 1 

2^=i  A 


dp 


A2p 


i*j  ' 


i,j  [/  (j)  sgn(xj) 


f{xj/P)sgn(xi)\  +  Y.U\ 
/?)  -  d  =  0 


2  X2sgn(ip(xk/P')) 


+ 


A1+2A2j(p(xic/p)\ 


ft 


+4A2\xk/p\ 


-(xk/ 


(14) 


Thus  (p  is  generally  invertible  only  for  cases  (1)  and  (2).  This 
establishes  the  lemma. 

From  Lemma  1  we  note  that  by  setting  y1  =  1  and  y2  =  0, 
MCG  reduces  to  the  CG  distribution  as  a  special  case.  The 
following  Lemma  details  a  complementary  direction  in  which 
the  above  model  can  be  extended. 

Lemma  2:  Let  !1(s)  =  y1(s)it1(s)  and  T2(s)  =  Y2(s)u2(s)- 
Then  (p  in  (4)  is  invertible  if  !1(s)!2(s)  =  0. 

Proof:  The  proof  follows  by  noting  that  the  condition 
!1(s)!2(s)  =  0  ensures  that  one  and  only  one  of  the  two 
channels  (i.e.  the  linear  and  quadratic  channels  as  described 
in  II-A)  are  active  for  a  given  value  of  s.  Thus  invertibility  of 
< p  trivially  follows.  This  establishes  the  lemma. 

We  note  that  under  Lemma  2,  MCG  reduces  to  the  MICA 
distribution  [3-4]  which  thus  formally  complements  the  CG 
model.  Thus  the  MCG  model  subsumes  both  the  CG  and 
MICA  distributions  as  special  cases. 

Importantly  the  following  lemma,  whose  proof  is  similar 
to  that  of  the  central  theorem  in  [3],  furnishes  a  closed  form 
expression  for  the  Jacobian  \J  (A)  |: 

Lemma  3:  The  Jacobian  for  the  MCG  model  is: 

\JW\  =  \A\l\LiM(yk)  (13) 

where  t/i(yk)  =  A±  +  2A2\yk\ 

We  can  easily  show  that  Lemma  3  reduces  to  the  central 
theorem  in  [3]  for  the  special  case  of  MICA  distributions.  The 
existence  of  the  above  closed  form  expression  for  \J  ( X )  | 
obviates  the  need  for  Monte-Carlo  simulation  for  the 
estimation  of  system  parameters.  The  following  section  delves 
into  more  details  on  the  algorithmic  aspects  of  parameter 
estimation  for  the  MCG  model. 

III.  MCG  Parameter  Estimation  Algorithm 

Similar  to  the  general  strategy  in  [1],  we  take  a  two-stage 
approach  to  estimating  the  parameters  of  the  MCG  model. 
Given  a  set  of  N  neighborhoods  (of  wavelet  coefficients)  of 
size  MxM  (where  N»M;  for  e.g.  we  choose  N=2000,  M=  3  in 
our  simulations  below)  which  are  organized  in  a  matrix  X,  we 
estimate  the  optimum  parameter  /?  (for  a  given  set  of 
multilinear  parameters)  corresponding  to  each  such  sampled 
neighborhood.  Each  neighborhood  is  then  normalized  by  the 
corresponding  /?  parameter.  Thereafter  the  optimum 
multilinear  parameters  are  determined  in  the  manner  described 
below.  This  process  is  then  iterated  several  times  until 
convergence. 

In  order  to  simply  matters,  we  assume  as  in  [1]  that  B  is  a 
scalar  random  variable.  Let  x  =  [xlt ... ,  xd]T  denote  a 


where,  /(x)  =  since  /?  is  a  scalar  parameter,  we 

can  easily  solve  for  /?  in  a  computationally  tractable  manner 
by  employing  a  brute  force  approach  of  evaluating  (14)  for 
every  value  of  /?  in  a  interval  [0,  max(X)]  and  choosing  the  /? 
that  renders  the  R.H.S.  of  (14)  closest  to  zero.  Furthermore  it 
is  easy  to  verify — both  analytically  and  numerically — that  in 
the  CG  limit  (i.e.  A1  =  1,A2  —>  0)  the  optimum  solution  of 
(14)  is  exactly  the  closed  form  solution  of  P  for  the  GSM  case 
[1]  as  one  would  expect. 

Having  obtained  the  set  of  wavelet  coefficient 
neighborhoods  normalized  by  random  variable  B,  we  now 
detail  how  to  update  the  remaining  parameters  of  the  MCG 
model.  Firstly  in  our  simulations,  consistent  with  the  CG 
model  of  the  wavelet  structure  of  natural  images,  we  set  p  =  0 
and  T  =  0.  Thus  the  parameters  that  remain  to  be  estimated 
are  matrix  C,  vector  a,  and  the  scalars  A1  and  A2.  For 
estimating  C  and  a,  we  employ  a  similar  trick  as  used  in  [4] : 
let  yl  =  tpx(xl / fi)  where  xl  denotes  the  i,h  vector  of 
neighborhood  coefficients,  and  where  the  subscript  denotes 
that  the  non-linear  function  <p  depends  on  A  =  [A1,A2];  then 

set  C  =  VQ  (  where  Q  =  ^Hk=iyk(yk),)>  and  set  to  be  the 
standard  deviation  of  the  ith  component  of  the  normalized  and 
inverted  wavelet  coefficient  neighborhoods  {yk}k.  Note  that 
the  computation  of  C  and  a  assumed  knowledge  of  A,  but  did 
not  involve  any  iterative  procedure  for  estimation. 

To  compute  the  optimum  A  (given  knowledge  of  the  other 
multilinear  parameters)  we  resort  to  an  unconstrained  iterative 
maximum-likelihood  estimation  approach  based  on  the 
following  descent  equations: 

P))<P'(xk/P)]  -  Yi= i  2 akcp(xk/P)  <p'{xk/p )  - 

T.i*j  Gij\<P(xi/P)(p'(xj/p )  +  <p{xj/p)(p'{xi/P)\  (15) 

A  similar  gradient  equation  can  also  be  derived  with  respect  to 
A2.  Thus  the  following  algorithm  summarizes  our  MCG 
parameter  estimation  procedure: 

0)  Initialize  P  according  to  the  CG  model.  From  this 
initialize  C  and  a. 

1)  Re-estimate  P  by  solving  (14) 

2)  Normalize  X  according  to  P  computed  above  and  re- 
estimate  C  and  a  accordingly 

3)  Estimate  the  optimum  A  via  (15).  Re-estimate  C  and 
a  accordingly 

4)  Iterate  (l)-(3)  until  convergence. 
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IV.  Simulation  Results 

The  above  MCG  parameter  estimation  algorithm  was 
applied  to  a  set  of  N=2000  neighborhoods  of  wavelet 
coefficients  of  size  3x3.  We  chose  the  bi-orthogonal  1.1 
wavelet  and  chose  the  horizontal  sub-band  in  our  simulations 
(although  the  results  hold  generally  for  any  sub-band  and 
choice  of  wavelet  filters).  Having  learnt  the  optimal  MCG 
parameters  (including  A)  from  the  training  set,  we 
subsequently  sample  multiple  set  of  N  wavelet  neighborhoods 
and,  for  each  case,  compute  the  KLD  (Kullback  Leibler 
Divergence)  between  the  normalized  empirical  histograms  of 
the  /(-normalized  version  of  each  wavelet  coefficient  and  the 
histogram  as  predicted  by  the  CG  and  MCG  models 
respectively. 

Tables  1-3  show  respectively  the  average  performance  of 
CG  and  MCG  models  for  the  Herringbone  texture  [9],  the 
hangers  SAR  image  [10],  and  the  Baboon  Optical  image.  The 
results  recorded  here  are  statistically  significant.  The  optimum 
value  of  A  for  each  case  is  also  displayed.  Since  AXA2  <  0  for 
the  case  of  Baboon  image,  by  Lemma  1  a  unique  inverse 
cannot  be  guaranteed;  nevertheless  the  generative  algorithm 
furnishes  above  one  possible  solution  that  is  consistent  with 
the  MCG  model.  From  the  results  it  is  clear  that  the  MCG 
distribution  outperforms  CG  in  all  cases. 


structure;  and  more  generally  opens  up  exciting  new  avenues 
for  accurately  modeling  diverse  stochastic  phenomena. 
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Figure  1 


Avg.KLD 

Channel  1 

Channel2 

ChanneB 

Channel4 

Channel5 

Channel6 

Channel7 

Channel8 

Channel9 

CG  (GSM) 

0.5163 

0.5706 

0.4407 

0.4365 

0.4119 

0.4377 

0.4312 

0.5248 

0.4782 

MCG 

0.0266 

0.0335 

0.0226 

0.0471 

0.0611 

0.0455 

0.0320 

0.0340 

0.0326 

Table  lrHerringbone  Texture.  Optimum  A  =  [At  =  0. 6368 ,A2  =  0. 0051].  Each  channel  is  a  different  node  in  the  neighborhood  structure. 


Avg.KLD 

Channel  1 

Channel2 

ChanneB 

Channel4 

ChanneB 

Channel6 

Channel7 

Channel8 

Channel9 

CG  (GSM) 

0.6231 

0.3568 

0.4989 

0.4783 

0.2797 

0.4303 

0.4676 

0.4311 

0.5609 

MCG 

0.0842 

0.0353 

0.0464 

0.0784 

0.0550 

0.0556 

0.0601 

0.0587 

0.0809 

Table  2:  Hangers  SAR  Image.  Optimum  A  =  [A±  =  1.4369,  A2  =  0. 0020].  Each  channel  is  a  different  node  in  the  neighborhood  structure. 


Avg.KLD 

Channel  1 

ChanneB 

ChanneB 

Channel4 

ChanneB 

Channel6 

Channel7 

Channel8 

Channel9 

CG  (GSM) 

0.4036 

0.3087 

0.2989 

0.2514 

0.2756 

0.2872 

0.3322 

0.2966 

0.3174 

MCG 

0.0353 

0.0254 

0.0256 

0.0237 

0.0231 

0.0251 

0.0233 

0.0200 

0.0274 

Table  3:  Baboon  Image.  Optimum  A  =  [At  =  0. 6018,  A2  =  —0. 0092].  Each  channel  is  a  different  node  in  the  neighborhood  structure. 
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